An Evaluation of Generic Bulk Loading Techniques
Abstract
Bulk loading refers to the process of creating an index from scratch for a given data set. This problem is well understood for B-trees, but non-traditional index structures have so far received only modest attention. We are particularly interested in fast generic bulk loading techniques whose implementations rely only on a small interface that is satisfied by a broad class of index structures. Generic techniques are very attractive for extensible database systems, since different user-implemented index structures that implement this small interface can be bulk-loaded without any modification of the generic code. The main contribution of the paper is the proposal of two new generic and conceptually simple bulk loading algorithms. These algorithms recursively partition the input by using a main-memory index of the same type as the target index to be built. In contrast to previous generic bulk loading algorithms, our new algorithms are much easier to implement. Another advantage is that they have fewer parameters whose settings have to be considered. We present an experimental performance comparison in which different bulk loading algorithms are investigated in a system-like scenario. Our experiments are unique in that we examine the same code for different index structures (R-tree and Slim-tree). The results consistently indicate that our new algorithms outperform asymptotically worst-case optimal competitors. Moreover, the search quality of the target index is better when our new bulk loading algorithms are used.
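The idea of partitioning the input recursively with a main-memory index can be illustrated with a toy sketch. This is not the paper's algorithm: the `PartitioningIndex` interface, the `ToyKeyIndex` class, and the parameters `memory_capacity` and `leaf_capacity` are all hypothetical names chosen for illustration, and a one-dimensional key index stands in for a real structure such as an R-tree.

```python
from bisect import bisect_right
from typing import Callable, List, Protocol, Sequence


class PartitioningIndex(Protocol):
    """Hypothetical minimal interface; NOT the interface defined in the paper."""

    def insert(self, item: float) -> None: ...
    def route(self, item: float) -> int: ...
    def bucket_count(self) -> int: ...


class ToyKeyIndex:
    """Toy 1-D main-memory index that splits the key space into `fanout` buckets."""

    def __init__(self, fanout: int) -> None:
        self.fanout = fanout
        self.keys: List[float] = []
        self._separators: List[float] = []
        self._dirty = True

    def insert(self, item: float) -> None:
        self.keys.append(item)
        self._dirty = True

    def route(self, item: float) -> int:
        if self._dirty:
            # Derive fanout-1 separators from the sorted in-memory keys.
            ordered = sorted(self.keys)
            step = len(ordered) / self.fanout
            self._separators = [ordered[int(i * step)] for i in range(1, self.fanout)]
            self._dirty = False
        return bisect_right(self._separators, item)

    def bucket_count(self) -> int:
        return self.fanout


def bulk_load(
    items: Sequence[float],
    make_index: Callable[[], PartitioningIndex],
    memory_capacity: int,
    leaf_capacity: int,
) -> List[List[float]]:
    """Return leaf-sized groups; the generic code only uses the small interface."""
    if len(items) <= leaf_capacity:
        return [list(items)]
    # Build a main-memory index from as many items as (hypothetically) fit in memory.
    index = make_index()
    for item in items[:memory_capacity]:
        index.insert(item)
    # Use that index to partition the whole input into buckets.
    buckets: List[List[float]] = [[] for _ in range(index.bucket_count())]
    for item in items:
        buckets[index.route(item)].append(item)
    if max(len(b) for b in buckets) == len(items):
        # Degenerate split (e.g. many duplicates): fall back to fixed-size chunks.
        return [list(items[i:i + leaf_capacity]) for i in range(0, len(items), leaf_capacity)]
    leaves: List[List[float]] = []
    for bucket in buckets:
        if bucket:
            # Recurse on each bucket until it fits into a single leaf.
            leaves.extend(bulk_load(bucket, make_index, memory_capacity, leaf_capacity))
    return leaves
```

Calling `bulk_load(data, lambda: ToyKeyIndex(4), 16, 8)` yields groups of at most eight keys. Swapping in another class that satisfies the same three methods would leave `bulk_load` unchanged, which is the sense in which such a loader is generic.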
Similar resources
Evaluation of Effectiveness of Main Factors on the Reduction of Loading and Discharging Performance Versus Loading and Discharging Rate of Dry Bulk Terminal (Case Study of Imam Khomeini Port)
The aim of this article is to measure the impact of main factors affecting the reduction of discharge and loading performance compared to dry bulk discharge and loading in terminal of Imam Khomeini Port. For this purpose, the actual data presented in Imam Khomeini Port for discharging and loading statistics and library documented data were used. In order to answer the research questions, multip...
Space-Partitioning-Based Bulk-Loading for the NSP-Tree in Non-ordered Discrete Data Spaces
Properly designed bulk-loading techniques are more efficient than the conventional tuple-loading method in constructing a multidimensional index tree for a large data set. Although a number of bulk-loading algorithms have been proposed in the literature, most of them were designed for continuous data spaces (CDS) and cannot be directly applied to non-ordered discrete data spaces (NDDS). In this ...
A Generic Approach to Bulk Loading Multidimensional Index Structures
Recently there has been an increasing interest in supporting bulk operations on multidimensional index structures. Bulk loading refers to the process of creating an initial index structure for a presumably very large data set. In this paper, we present a generic algorithm for bulk loading which is applicable to a broad class of index structures. Our approach differs completely from previous one...
The Bulk Index Join: A Generic Approach to Processing Non-Equijoins
Efficient join algorithms have been developed for processing different types of non-equijoins like spatial join, band join, temporal join or similarity join. Each of these previously proposed join algorithms is tailor-cut for a specific type of join, and a generalization of these algorithms to other join types is not obvious. We present an efficient algorithm called bulk index join that can be ...
Parallel bulk-loading of spatial data
Spatial database systems have been introduced in order to support non-traditional data types and more complex queries. Although bulk-loading techniques for access methods have been studied in the spatial database literature, parallel bulk-loading has not been addressed in a parallel spatial database context. Therefore, we study the problem of parallel bulk-loading, assuming that an R-tree like ...
Publication date: 2001